fix(0.2.1): wire request context + dedup in-flight server analyses #132
Open
Two related defects in `terrain serve`, both flagged by the
launch-readiness review and verified in the codebase:
1. **Request context not wired.** The HTTP handlers ignored
r.Context() and called engine.RunPipeline (which wraps
context.Background()), so a client disconnect during a long
analysis left the analysis running in the background with no
handler waiting on it.
2. **Mutex-blocking analysis cache.** getResult held Server.mu via
defer-Unlock for the full analysis duration. One slow analysis
serialized every other request needing a pipeline result behind a
single goroutine, regardless of whether the cache was warm enough
for them.
Both ship together because the right fix to (2) replaces the cache
mutex with an RWMutex + singleflight, which also gives us a clean
seam for (1):
* Fast path: RWMutex.RLock so warm-cache hits don't contend.
* Slow path: singleflight.Group.DoChan dedups concurrent in-flight
analyses (one analysis per cache window, even with N waiters).
* Per-caller cancellation: each handler threads r.Context() through
getResult; the select on (ch | ctx.Done()) returns ctx.Err()
immediately on disconnect. The shared analysis runs with
context.Background() so a single caller's disconnect doesn't kill
work other waiters depend on.
Tradeoff documented inline: a single-waiter request whose context is
canceled won't (yet) cancel the underlying analysis.
Reference-counting waiters is on the 0.3 list.
Tests added:
* TestGetResult_CacheHit — fast path returns cached pointer
* TestGetResult_RespectsCanceledContext — pre-canceled context
returns context.Canceled within 2s rather than blocking on
analysis (pre-fix this hung until pipeline completion)
* TestGetResult_ConcurrentCallsShareCache — 50 concurrent callers
on a warm cache observe the same report pointer
* go build ./..., go test ./..., go vet ./... all clean
Dep: golang.org/x/sync v0.10.0 (singleflight). Pinned to v0.10.0
because v0.20.0 bumps the go directive to 1.25; v0.10.0 is
compatible with the existing go 1.23.
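Reproducing the pin locally is one command, `go get golang.org/x/sync@v0.10.0` (assuming a standard module layout), which leaves this require line in go.mod:

```
require golang.org/x/sync v0.10.0
```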
Server-package doc-comment refreshed to describe the new
concurrency model and remove the now-fixed "known issues" block
that was added in PR #131.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[RISK] Terrain — Merge with caution
* Coverage gaps in changed code
* Pre-existing issues (3)
* Recommended tests: 1 test with exact coverage of 6 impacted units;
  2 impacted units have no covering tests in the selected set.
* Owners: PMCLSF
Generated by Terrain · Targeted test results: Terrain selected 1 test
instead of the full suite.
Terrain AI Risk Review
Decision: PASS — AI surfaces are covered.
Summary
Closes the two server defects flagged in the 0.2.0 launch-readiness review. The CHANGELOG block under "Known issues tracked for 0.2.1" can be removed in a follow-up once this lands.
🤖 Generated with Claude Code